Hidden Dynamic Models for Speech Processing Applications

Authors

  • Leo Jingyu Lee
  • Leo J. Lee

Abstract

Human speech has a dual nature: its goal is to convey the discrete linguistic symbols that correspond to the intended message, while the actual speech signal is produced by the continuous, smooth movement of the articulators and has rich temporal structure. Humans exploit this dual nature remarkably well, but it poses a major challenge for both speech science and speech technology. This thesis starts from the observation that the continuous, or dynamic, aspect of human speech is inadequately modeled in current speech technology, especially in state-of-the-art speech recognition systems, even though much could be learned from recent advances in speech science. This motivates a study of articulatory dynamics based on a recently available large-scale speech production database that provides simultaneous acoustic and articulatory measurements. The study yields many insights and much practical experience, and as a result a hidden dynamic model (HDM) that integrates the discrete and continuous nature of speech is proposed. However, it also turns out that articulatory dynamics are highly complicated and cannot be captured by simple models, which makes them very difficult to place in an efficient computational framework for speech technology. Continuing the search for internal dynamics of human speech that reflect the continuous change in vocal-tract shape and can benefit current speech technology, the second part of the thesis turns to vocal-tract-resonance (VTR) dynamics, building on the insights and experience gained from studying articulatory dynamics. It verifies that VTR dynamics can be captured by simple dynamic equations, and it designs a highly accurate and efficient piecewise linear mapping from VTR dynamics to the acoustic space.
Two novel VTR tracking methods are developed in this part. The first mimics manual tracking of VTR dynamics by human experts and uses advanced image processing methods (active contours); the second follows naturally from formulating an HDM for VTR dynamics and recovers the hidden dynamics by Kalman smoothing. The residual feature produced by HDM-based VTR tracking is also appended to the acoustic features to improve a hidden Markov model (HMM) based phone recognizer on the TIMIT database. The final part of the thesis is devoted to what is arguably the most difficult and comprehensive speech processing application: automatic speech recognition (ASR). It first casts the HDM formulated for speech applications in the general framework of probabilistic graphical models from machine learning. It then becomes clear, however, that exact inference and parameter learning for such a model are NP-hard. To make the HDM usable for speech recognition, this final part concentrates on developing novel and powerful variational EM algorithms. The effectiveness of the new algorithms is demonstrated by extensive simulation experiments, and issues specific to speech recognition are also discussed.
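
The second VTR tracker recovers hidden dynamics by Kalman smoothing. The sketch below illustrates that idea only under stated assumptions: a scalar, first-order target-directed state equation (hypothetical parameters `r`, `target`, `q`) and a single linear observation mapping (hypothetical `h`, `c`, `R`) standing in for the piecewise linear VTR-to-acoustic mapping. The function name `rts_smooth`, the parameterization, and the simulated values are illustrative and not taken from the thesis; the smoother is the standard Rauch-Tung-Striebel (RTS) form of Kalman smoothing.

```python
import random
from typing import List, Tuple


def rts_smooth(y: List[float], r: float, target: float, h: float, c: float,
               q: float, R: float, m0: float, P0: float
               ) -> Tuple[List[float], List[float], List[float], List[float]]:
    """Kalman filter + Rauch-Tung-Striebel smoother for a hypothetical scalar
    target-directed model (illustrative only, not the thesis's exact HDM):

        x_t = r * x_{t-1} + (1 - r) * target + w_t,   w_t ~ N(0, q)
        y_t = h * x_t + c + v_t,                      v_t ~ N(0, R)
        x_0 ~ N(m0, P0)

    Returns filtered means/variances and smoothed means/variances.
    """
    n = len(y)
    m_pred, P_pred = [0.0] * n, [0.0] * n
    m_f, P_f = [0.0] * n, [0.0] * n
    for t in range(n):
        # Predict: the hidden state moves a fraction (1 - r) toward the target.
        if t == 0:
            m_pred[t], P_pred[t] = m0, P0
        else:
            m_pred[t] = r * m_f[t - 1] + (1 - r) * target
            P_pred[t] = r * r * P_f[t - 1] + q
        # Update: correct the prediction with y_t via the linear map h * x + c.
        S = h * h * P_pred[t] + R          # innovation variance
        K = P_pred[t] * h / S              # Kalman gain
        m_f[t] = m_pred[t] + K * (y[t] - (h * m_pred[t] + c))
        P_f[t] = (1.0 - K * h) * P_pred[t]
    # Backward (RTS) pass: refine each estimate using all later observations.
    m_s, P_s = m_f[:], P_f[:]
    for t in range(n - 2, -1, -1):
        G = P_f[t] * r / P_pred[t + 1]     # smoother gain
        m_s[t] = m_f[t] + G * (m_s[t + 1] - m_pred[t + 1])
        P_s[t] = P_f[t] + G * G * (P_s[t + 1] - P_pred[t + 1])
    return m_f, P_f, m_s, P_s


if __name__ == "__main__":
    # Simulate one noisy trajectory from the same hypothetical model.
    random.seed(0)
    r, target, h, c, q, R = 0.9, 500.0, 1.0, 0.0, 25.0, 400.0
    xs, ys = [], []
    x = random.gauss(300.0, 10.0)          # x_0 ~ N(300, 100)
    for t in range(50):
        if t > 0:
            x = r * x + (1 - r) * target + random.gauss(0.0, q ** 0.5)
        xs.append(x)
        ys.append(h * x + c + random.gauss(0.0, R ** 0.5))
    m_f, P_f, m_s, P_s = rts_smooth(ys, r, target, h, c, q, R, m0=300.0, P0=100.0)

    def rmse(est: List[float]) -> float:
        return (sum((e - truth) ** 2 for e, truth in zip(est, xs)) / len(xs)) ** 0.5

    print(f"filtered RMSE: {rmse(m_f):.2f}, smoothed RMSE: {rmse(m_s):.2f}")
```

Because the smoother conditions each estimate on the whole utterance rather than only on past frames, its posterior variances are never larger than the filtered ones. A closer model would switch among several linear segments, in the spirit of the piecewise linear mapping described above; this sketch does not attempt that.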

Similar resources

Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM

Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research achievements show that using deep neural network (DNN) in speech recognition systems significantly improves the performance of these systems. There are two phases in DNN-based phoneme recognition systems including training and testing. Mos...

An Empirical Exploration of Hidden Markov Models: From Spelling Recognition to Speech Recognition

Hidden Markov models play a critical role in the modelling and problem solving of important AI tasks such as speech recognition and natural language processing. However, the students often have difficulty in understanding the essence and applications of Hidden Markov models in the context of a cursory introductory coverage of the subject. In this paper, we describe an empirical approach to expl...

Hidden Markov Random Fields

A noninvertible function of a first order Markov process, or of a nearest-neighbor Markov random field, is called a hidden Markov model. Hidden Markov models are generally not Markovian. In fact, they may have complex and long range interactions, which is largely the reason for their utility. Applications include signal and image processing, speech recognition, and biological modeling. We show t...

Publication date: 2004